ABSTRACT



A variety of approaches have been suggested by which to 
assess the equality of population mean vectors under conditions of population 
covariance matrix homogeneity and heterogeneity. The nonrobustness of 
commonly used multivariate tests of means to population covariance matrix 
heterogeneity has been long documented. However, most studies have examined 
the performance characteristics of the statistical procedures under 
conditions of heterogeneous covariance structure by simulating heterogeneity 
in the structure of the variances. The only study that examined performance 
under heterogeneous covariance structure by simulating heterogeneity in the 
correlations concluded that there was little difference in the performance 
characteristics of standard multivariate tests of means under conditions of variance 
homogeneity and correlation heterogeneity (T. Beasley and J. Sheehan, 1994); 
this study, however, only examined the performance of the procedures under 
equal sample sizes. This paper assesses the Type I error control of standard 
and alternative multivariate tests of means under homogeneous and 
heterogeneous correlation structure for a full range of sample size 
conditions. This paper focuses on the performance of multivariate tests on 
means in the two-group case. A Monte Carlo simulation experiment was 
conducted. Findings show that the "F" based on Hotelling's T-squared is 
robust to between groups differences in correlation matrices under equal and 
unequal sample size conditions as long as the difference in the magnitude of 
the correlations is not extremely large, no matter what the sample size 
conditions or the number of variables under study. Differences between the 
performance profiles of the standard multivariate means test procedure and 
available alternative procedures are discussed. An appendix provides a chart 
of observed percent bias on Type I error control of multivariate tests on 
means. (Contains 5 tables and 22 references.) (Author/SLD) 
Abstract. Over the decades, a variety of approaches have been suggested by which to assess the equality of 
population mean vectors under the condition of population covariance matrix homogeneity and heterogeneity. The 
nonrobustness of commonly used multivariate tests of means to population covariance matrix heterogeneity has been 
long documented. However, most studies have examined the performance characteristics of the statistical procedures 
under conditions of heterogeneous covariance structure by simulating heterogeneity in the structure of the variances. 
The only study which examined performance under heterogeneous covariance structure by simulating heterogeneity 
in the correlations concluded that there was little difference in the performance characteristics of standard 
multivariate tests of means under conditions of variance homogeneity and correlation heterogeneity (Beasley & 
Sheehan, 1994); this study, however, only examined the performance of the procedures under equal sample sizes. 
The present paper assesses the Type I error control of standard and alternative multivariate tests of means under 
homogeneous and heterogeneous correlation structure for a full range of sample size conditions. This paper focuses 
on the performance of multivariate tests on means in the two-group case. 



Subject descriptors: Hotelling’s T-squared, multivariate tests on means, heterogeneity of covariance matrices, 
heterogeneity of correlation matrices, Type I error, MANOVA. 

Introduction 



Multivariate data analytic procedures are widely used by researchers in many different disciplines. 
Multivariate questions that are of interest to many researchers include comparisons of mean, correlation, and 
covariance structure between experimental and intact groups. Even though there are a wide variety of available data 
analytic techniques, the procedures which are most commonly used to compare the mean structure of several groups 
on several variables include parametric multivariate analysis of variance (MANOVA) and related discriminant 
function analysis techniques, where assumptions are that the observations in the comparison groups are obtained 
from multivariate normal populations with homogeneous covariance matrices. 

As is well known, not all statistical assumptions are realistic nor are all procedures robust to assumption 
violations. Thus, the question of the tenability of assumptions is an important point of consideration. While the 
question of whether data can be assumed to be obtained from multivariate normal populations has received much 
attention in recent years (e.g., Micceri, 1989), the question of whether data can be assumed to be obtained from 
populations with homogeneous covariance structures has received far less attention. 

The question of the tenability of the assumption of homogeneous covariance matrices is especially salient 
when testing differences between intact groups, but it can also be an issue with experimental groups. Heterogeneity 
of covariance matrices manifests in two basic ways: (a) the variances of some or all of the variables 
are different, and/or (b) some or all of the variables are correlated differently in at least two of the groups under 
study. Though the tenability of the assumption of homogeneity of covariance matrices is not generally discussed, 
some conditions under which commonly used multivariate tests of means are nonrobust to population covariance 
matrix heterogeneity have been documented (e.g., Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Olson, 
1974). 

Importantly, however, with the exception of Beasley and Sheehan (1994), most studies examining the 
performance characteristics of the multivariate tests on means under conditions of heterogeneous covariance 
structure simulate heterogeneity in the structure of the variances, not heterogeneity in the structure of the 
correlations. As such, the impact of heterogeneous patterns in the variable variances has been widely studied (Algina 
& Oshima, 1990; Algina, Oshima, & Tang, 1991; Algina & Tang, 1988; Everitt, 1979; Hakstian, Roed, & Lind, 
1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963; Kim, 1992; Mardia, 1971; Subrahmaniam & Subrahmaniam, 
1973; Yao, 1965); in contrast, the impact of heterogeneous correlation patterns has been studied very little. 

Over the decades, a variety of approaches have been suggested by which to assess the equality of population 
mean vectors under the condition of population covariance matrix homogeneity and heterogeneity. Test statistics 
which have been proposed for use under conditions of heterogeneity of covariance matrices include procedures 
suggested by James (1954), Yao (1965), Johansen (1980), Nel and van der Merwe (1986), and Kim (1992). Studies 
examining the performance of these alternative techniques have established the improved performance of these 
procedures over the standard parametric procedures under conditions of heterogeneous patterns in the variable 
variances; no study has examined the relative performance of these techniques under heterogeneous correlation 
patterns. 

The present study assesses the Type I error control of standard and alternative multivariate tests of means 
under homogeneous and heterogeneous correlation structure for a full range of sample size conditions. This paper 
focuses on the performance of multivariate tests on means in the two-group case. The procedures under study, 
reviewed in the following section, include the standard parametric F statistic based on Hotelling’s T-squared and 
the alternative test statistics suggested by James (1954), Yao (1965), Johansen (1980), Nel and van der Merwe (1986), 
and Kim (1992), which are recommended for use under conditions of heterogeneous covariance matrices. 



Test procedures examined 

Consider $n_i$ independent identically distributed observation vectors $x_{i1}, \ldots, x_{in_i}$ obtained from a 
$p$-dimensional multivariate normal population, with $p \times 1$ population mean vector $\mu_i$, non-singular 
$p \times p$ population covariance matrix $\Sigma_i$, and population correlation matrix $P_i$. Let $\bar{x}_i$ and 
$S_i$ represent the sample mean vector and the sample covariance matrix for the $i$th group ($i = 1, 2$), and let 
$S = (n_1 + n_2 - 2)^{-1}[(n_1 - 1)S_1 + (n_2 - 1)S_2]$. 

Many multivariate tests of the null hypothesis on the equality of two population mean vectors are 
formulated as functions of one of the scalar quantities 
$T_H^2 = (\bar{x}_1 - \bar{x}_2)'[n_1^{-1}S + n_2^{-1}S]^{-1}(\bar{x}_1 - \bar{x}_2)$ or 
$T_U^2 = (\bar{x}_1 - \bar{x}_2)'[n_1^{-1}S_1 + n_2^{-1}S_2]^{-1}(\bar{x}_1 - \bar{x}_2)$. Statistics which are 
functions of $T_H^2$ are typically based on the assumption of the equality of the two population covariance 
matrices, whereas statistics which are functions of $T_U^2$ are not typically based on the assumption of 
covariance matrix homogeneity. 

For a test of the null hypothesis on the equality of two population mean vectors, the standard parametric 
statistic (Hotelling, 1951), based on the assumption that observations are independent identically distributed from 
multivariate normal populations with homogeneous population covariance matrices, is 
$F_H = [(n_1 + n_2 - 2)p]^{-1}(n_1 + n_2 - p - 1)T_H^2$. This statistic has an F distribution with $p$ and 
$n_1 + n_2 - p - 1$ degrees of freedom. 
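The pooled-covariance statistic can be illustrated with a minimal sketch. It is not the FORTRAN program used in the study; the function names (`mean_vector`, `covariance`, `solve`, `hotelling_f`) are hypothetical, and the sketch assumes only the formula stated above, F = (n1 + n2 - p - 1) T² / [(n1 + n2 - 2) p], with T² computed from the pooled covariance matrix. It returns the statistic and its degrees of freedom and leaves the F reference distribution to a statistics library.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, m = len(rows), mean_vector(rows)
    p = len(m)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(mat[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, p))
        x[i] = (aug[i][p] - tail) / aug[i][i]
    return x


def hotelling_f(group1, group2):
    """Return (T2, F, df1, df2) for the pooled-covariance two-group test."""
    n1, n2 = len(group1), len(group2)
    p = len(group1[0])
    s1, s2 = covariance(group1), covariance(group2)
    pooled = [[((n1 - 1) * s1[a][b] + (n2 - 1) * s2[a][b]) / (n1 + n2 - 2)
               for b in range(p)] for a in range(p)]
    d = [a - b for a, b in zip(mean_vector(group1), mean_vector(group2))]
    scale = 1.0 / n1 + 1.0 / n2
    # T2 = d' [(1/n1 + 1/n2) S]^{-1} d, computed with a linear solve
    u = solve([[scale * v for v in row] for row in pooled], d)
    t2 = sum(di * ui for di, ui in zip(d, u))
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f, p, n1 + n2 - p - 1
```

With a single variable the statistic reduces to the squared pooled two-sample t, which gives a quick sanity check: `hotelling_f([[1], [2], [3]], [[2], [4], [6]])` yields T² of about 2.4 on 1 and 4 degrees of freedom.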

A wide variety of alternative test statistics have been proposed. The alternative approaches under 
consideration in this paper are procedures suggested by James (1954), Yao (1965), Johansen (1980), Nel and van der 
Merwe (1986), and Kim (1992). These procedures all assume that observations are independent identically 
distributed from multivariate normal populations; however, they do not assume that the populations have 
homogeneous covariance matrices. 

For a test of the null hypothesis that two population mean vectors are equal, James (1954) expressed the 
critical value for $T_U^2$ as a series of terms in descending order of magnitude. The 1st order approximation of the 
critical value is given by $c(A + cB)$, where $c$ is the $1 - \alpha$ percentile point of the central chi-square 
distribution with $p$ degrees of freedom, $A_i = n_i^{-1}S_i$, $V = \sum_{i=1}^{2} A_i$, 
$A = 1 + (2p)^{-1}\sum_{i=1}^{2}(n_i - 1)^{-1}\,\mathrm{tr}^2(V^{-1}A_i)$, and 
$B = [p(p+2)]^{-1}\sum_{i=1}^{2}(n_i - 1)^{-1}\{\mathrm{tr}[(V^{-1}A_i)^2] + \tfrac{1}{2}\mathrm{tr}^2(V^{-1}A_i)\}$. 
James’ 2nd order approximation to the critical value 
is given by the sum of James’ 1st order critical value and 
a second-order correction term. 

[Equation not legible in the source: the correction term is a sum of products of the coefficients $\chi_{2s}$ 
($s = 1, \ldots, 4$), powers of $\nu_i^{-1}$, and the trace products $[i]$, $[i,j]$, and $[i,j,k]$; see James (1954).] 

Here $c$ is the $1 - \alpha$ percentile point of the central chi-square distribution with $p$ degrees of freedom, and 
$[i] = \mathrm{tr}(V^{-1}A_i)$, $[i,j] = \mathrm{tr}(V^{-1}A_iV^{-1}A_j)$, 
$[i,j,k] = \mathrm{tr}(V^{-1}A_iV^{-1}A_jV^{-1}A_k)$, 
$\chi_2 = c/p$, $\chi_{2s} = \chi_{2(s-1)}\,c/[p + 2(s-1)]$ for $s > 1$, and $\nu_i = n_i - 1$. 

Yao (1965) suggested a test statistic based on a transformation of $T_U^2$. This test statistic is 
$F_Y = (pf_2)^{-1}(f_2 - p + 1)T_U^2$, where $f_2^{-1} = \sum_{i=1}^{2}(n_i - 1)^{-1}(w_i/T_U^2)^2$ and 
$w_i = (\bar{x}_1 - \bar{x}_2)'V^{-1}A_iV^{-1}(\bar{x}_1 - \bar{x}_2)$. For 
Yao’s $F_Y$, critical values are obtained from the F distribution with $p$ and $f_2 - p + 1$ degrees of freedom. 
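As an illustration of the separate-covariance quantity and Yao's approximate degrees of freedom, the following sketch works from summary statistics. The function names are hypothetical and the code assumes only the definitions given in this section: A_i = S_i / n_i, V = A_1 + A_2, and w_i = d'V⁻¹A_iV⁻¹d with d the difference between the sample mean vectors. Because V is symmetric, w_i can be computed as u'A_i u with u = V⁻¹d, so a single linear solve suffices.

```python
def solve(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(mat[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, p))
        x[i] = (aug[i][p] - tail) / aug[i][i]
    return x


def quad_form(x, mat):
    """Quadratic form x' mat x."""
    p = len(x)
    return sum(x[a] * mat[a][b] * x[b] for a in range(p) for b in range(p))


def yao_f(mean1, mean2, s1, s2, n1, n2):
    """Return (T2_U, F_Y, df1, df2) from sample means and covariance matrices.

    Assumes the two mean vectors differ, so that T2_U is nonzero.
    """
    p = len(mean1)
    a1 = [[v / n1 for v in row] for row in s1]   # A_1 = S_1 / n_1
    a2 = [[v / n2 for v in row] for row in s2]   # A_2 = S_2 / n_2
    v = [[a1[i][j] + a2[i][j] for j in range(p)] for i in range(p)]
    d = [x - y for x, y in zip(mean1, mean2)]
    u = solve(v, d)                              # u = V^{-1} d
    t2 = sum(di * ui for di, ui in zip(d, u))    # T2_U = d' V^{-1} d
    w1, w2 = quad_form(u, a1), quad_form(u, a2)  # w_i = u' A_i u
    inv_f2 = (w1 / t2) ** 2 / (n1 - 1) + (w2 / t2) ** 2 / (n2 - 1)
    f2 = 1.0 / inv_f2
    f_y = (f2 - p + 1) / (p * f2) * t2
    return t2, f_y, p, f2 - p + 1
```

For a single variable the result coincides with the familiar Welch approximation: with variances 1 and 4 and three observations per group, the second degrees of freedom come out as 50/17.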

Johansen (1980) proposed the test statistic $F_{Jo} = c_1^{-1}T_U^2$, where $c_1 = p + 2C - 6C(p + 1)^{-1}$ and 
$C = \tfrac{1}{2}\sum_{i=1}^{2}(n_i - 1)^{-1}\{\mathrm{tr}[(V^{-1}A_i)^2] + \mathrm{tr}^2(V^{-1}A_i)\}$. For a test of 
the null hypothesis, the reference distribution of Johansen’s $F_{Jo}$ is the F distribution with $p$ and 
$p(p + 2)/(3C)$ degrees of freedom. 

The Nel and van der Merwe (1986) test statistic is $F_N = (pf_3)^{-1}(f_3 - p + 1)T_U^2$, where 
$f_3 = (\mathrm{tr}\,V^2 + \mathrm{tr}^2 V)\,[\sum_{i=1}^{2}(n_i - 1)^{-1}(\mathrm{tr}\,A_i^2 + \mathrm{tr}^2 A_i)]^{-1}$. 
Nel and van der Merwe’s $F_N$ is referred to the F distribution 
with $p$ and $f_3 - p + 1$ degrees of freedom. 
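The Nel and van der Merwe degrees of freedom involve only traces, so they are easy to sketch. The helper names below are hypothetical, and the code assumes the definitions in this section: A_i = S_i / n_i, V = A_1 + A_2, and a degrees-of-freedom value equal to [tr(V²) + tr²(V)] divided by the sum over groups of [tr(A_i²) + tr²(A_i)] / (n_i - 1).

```python
def trace(mat):
    """Sum of the diagonal elements."""
    return sum(mat[i][i] for i in range(len(mat)))


def trace_of_square(mat):
    """tr(M M) for a square matrix M."""
    p = len(mat)
    return sum(mat[i][j] * mat[j][i] for i in range(p) for j in range(p))


def nel_df(s1, s2, n1, n2):
    """Approximate degrees of freedom from two sample covariance matrices."""
    p = len(s1)
    a1 = [[v / n1 for v in row] for row in s1]   # A_1 = S_1 / n_1
    a2 = [[v / n2 for v in row] for row in s2]   # A_2 = S_2 / n_2
    v = [[a1[i][j] + a2[i][j] for j in range(p)] for i in range(p)]
    numerator = trace_of_square(v) + trace(v) ** 2
    denominator = sum((trace_of_square(a) + trace(a) ** 2) / (n - 1)
                      for a, n in ((a1, n1), (a2, n2)))
    return numerator / denominator


def nel_f(t2_u, f3, p):
    """Return (F_N, df1, df2) given the separate-covariance T2 and f3."""
    return (f3 - p + 1) / (p * f3) * t2_u, p, f3 - p + 1
```

Two checks are consistent with what one would expect: for identical identity covariance matrices and 11 observations per group the value is 20, the pooled degrees of freedom n1 + n2 - 2, and for one variable it matches the Welch value.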

Kim (1992) suggested an alternative test statistic. This statistic is 
$F_K = (c_2 m f_2)^{-1}(f_2 - p + 1)(\bar{x}_1 - \bar{x}_2)'A^{-1}(\bar{x}_1 - \bar{x}_2)$, where 
$A = A_1 + r^2A_2 + 2rA_2^{1/2}(A_2^{-1/2}A_1A_2^{-1/2})^{1/2}A_2^{1/2}$, $r = |A_1A_2^{-1}|^{1/(2p)}$, 
$c_2 = \sum_{j=1}^{p} L_j^2 / \sum_{j=1}^{p} L_j$, $m = c_2^{-1}\sum_{j=1}^{p} L_j$, 
$L_j = (d_j + 1)/(d_j^{1/2} + r)^2$, and $d_j$ is the $j$th eigenvalue of $A_1A_2^{-1}$. Kim’s $F_K$ is 
referred to the F distribution with $p$ and $f_2 - p + 1$ degrees of freedom. 



Relevant Monte Carlo research 

Numerous researchers (e.g., Algina & Oshima, 1990; Algina, Oshima, & Tang, 1991; Algina & Tang, 1988; 
Everitt, 1979; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963; Kim, 1992; Mardia, 
1971; Subrahmaniam & Subrahmaniam, 1973; Yao, 1965) have examined the performance of test statistics which 
enable the comparison of the mean vectors of two populations. Even though many studies have examined the 
performance of two-group multivariate tests on means under conditions of heterogeneous population covariance 
matrices, the studies examining the performance of the test statistics under heterogeneity of covariance matrices have 
simulated heterogeneity of the covariance matrices only by varying between groups the population variances of the 
underlying variables, not by varying between groups the population correlation structure of the underlying variables. 

In general, the research on the two-group multivariate test statistics on means has shown that the standard F 
based on Hotelling’s T-squared is relatively robust under equal sample size conditions. Under unequal 
sample size conditions, however, if the group with the smaller sample size has the smaller variances then the standard F 
procedure is conservative, and if the group with the smaller sample size has the larger variances then the standard F 
is liberal. Furthermore, the alternative procedures provide improved Type I error control in general (cf. Coombs, 
Algina, & Oltman, 1996). 

Beasley and Sheehan (1994) conducted a study on the impact of homogeneous variances and heterogeneous 
covariances on standard MANOVA procedures, of which the standard parametric two-group Hotelling T-squared 
procedure is a special case. For the conditions they examined, they determined that when the variances were equal 
the presence of unequal covariances did not impact the performance of the MANOVA procedures, that is, they found 
that the presence of heterogeneous correlation matrices did not impact the performance of the MANOVA 
procedures. Their study, however, only examined the performance of the MANOVA procedure under conditions of 
equal sample sizes. 

Studies simulating only heterogeneity of variances have established that MANOVA procedures are fairly 
robust to moderate heterogeneity of variances under equal sample sizes; however, it is also well 
known that this robustness does not hold under unequal sample sizes. Thus, even though Beasley and Sheehan’s results 
indicate that MANOVA procedures are robust to heterogeneity of correlation structure under equal sample sizes, one 
might expect that when sample sizes are unequal, the MANOVA procedures will not perform well for comparisons 
of mean vectors between groups from populations with heterogeneous correlation matrices. 

Methods 



A Monte Carlo simulation experiment was conducted in order to compare the Type I error control of 
procedures available to assess whether the mean vectors for two groups are different in the population under 
conditions of heterogeneous correlation structure. The relative performance of the standard Hotelling T-squared F 
statistic and the alternative James 1st order, James 2nd order, Yao, Johansen, Nel, and Kim procedures was assessed. 

A stand-alone FORTRAN computer program, implementing the standard parametric and alternative tests, 
was written for this study. This program was written by the author; some IMSL routines were used. 

Design 

Data are generated from multivariate normal populations where all the variables have mean 0 and unit 
variance. The Type I error control of the standard and alternative statistics is examined under the following 
conditions. 

Population correlation structure, $P_1$ and $P_2$: Data are from populations where variables are homogeneously 
intercorrelated. Data in group 1 are always generated from populations where variables are uncorrelated. Data in 
group 2 are generated from populations where the magnitude of the population correlations is .0, .1, .3, .5, .7, or .9. 

Number of variables, $p$: Data are from $p$-variate multinormal populations, where $p$ equals 4, 8, or 12. 

Sample size, $n_1$ and $n_1:n_2$: Data for group 1 are generated at specific ratios of sample size to number of 
variables; sample sizes for group 1, $n_1$, include the conditions $(p+1)$, $2p$, $4p$, $10p$, $20p$, and $40p$. Data for group 2 are 
generated as a function of the sample size in group 1. The ratios of sample size for group 1 to sample size for group 
2, $n_1:n_2$, are 1:1, 1:2, 1:4, 2:1, and 4:1. 

In typical simulations where observations are simulated to be independent identically distributed, a 
sequence of independent uniform variates (usually real-valued between 0 and 1) is first generated and then 
transformed in an appropriate way. The method of Kinderman and Ramage (1976) was used to generate data from 
multivariate normal distributions with the specified correlation structure. 
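One way to produce data with a homogeneous (equal) correlation among all variables can be sketched as follows. This is not the generator used in the study: it swaps in Python's `random.Random.gauss` for the Kinderman and Ramage normal generator and uses a one-factor construction rather than whatever IMSL routines the original program relied on. The function names are hypothetical. Each variable is built as sqrt(rho) times a shared standard normal plus sqrt(1 - rho) times its own standard normal, which gives unit variance and correlation rho between every pair of variables for 0 <= rho < 1.

```python
import math
import random


def equicorrelated_normal(n, p, rho, rng):
    """Draw n observations on p standard-normal variables with common correlation rho."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("this one-factor construction needs 0 <= rho < 1")
    shared_weight = math.sqrt(rho)
    unique_weight = math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # component common to all p variables
        rows.append([shared_weight * shared + unique_weight * rng.gauss(0.0, 1.0)
                     for _ in range(p)])
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Setting `rho` to 0 reproduces the uncorrelated structure used for group 1, and passing a seeded `random.Random` instance keeps a run reproducible.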

Hypotheses were tested at one level of nominal Type I error: $\alpha = .05$. For each data set, the test statistics 
and critical values necessary for assessing the equality of mean vectors were calculated; the decisions for the 
procedures were recorded. 

The four factors $P_2$, $p$, $n_1$, and $n_1:n_2$ are fully crossed, resulting in a 6x3x6x5 factorial design. Each condition 
is replicated 10,000 times. 
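The fully crossed design can be enumerated with a short sketch, which also confirms the cell count of 6 x 3 x 6 x 5 = 540. The constant and function names are hypothetical. The sketch records the sample size ratio as a label and does not compute the group 2 sample size, because the text does not say how non-integer values (for example, a 2:1 ratio with a group 1 size of p + 1) were handled.

```python
from itertools import product

GROUP2_CORRELATIONS = (0.0, 0.1, 0.3, 0.5, 0.7, 0.9)
NUM_VARIABLES = (4, 8, 12)
N1_LABELS = ("p+1", "2p", "4p", "10p", "20p", "40p")
SAMPLE_SIZE_RATIOS = ("1:1", "1:2", "1:4", "2:1", "4:1")


def group1_size(label, p):
    """Translate a sample-size rule such as '4p' into a group 1 sample size."""
    if label == "p+1":
        return p + 1
    return int(label[:-1]) * p  # strip the trailing 'p' to get the multiplier


def design_cells():
    """List every cell of the crossed design as a dictionary of factor levels."""
    return [
        {"rho2": rho2, "p": p, "n1": group1_size(label, p), "ratio": ratio}
        for rho2, p, label, ratio in product(
            GROUP2_CORRELATIONS, NUM_VARIABLES, N1_LABELS, SAMPLE_SIZE_RATIOS)
    ]
```

With 10,000 replications per cell, the full experiment therefore involves 5.4 million simulated data sets.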

Measures of performance 

Under each condition, the rejection frequency for each statistic is observed: the number 
of rejections obtained for each test is tabulated and transformed into the proportion rejected, giving the 
empirical rejection rate, $\alpha_{\text{Empirical}}$, for each statistic. For each cell, the bias and the percent bias 
($B_\%$) of the observed empirical rejection rate from the expected rejection 
rate, $\alpha_{\text{Nominal}}$, are obtained, where 
$B_\% = 100(\alpha_{\text{Empirical}} - \alpha_{\text{Nominal}})/\alpha_{\text{Nominal}}$. Factorial analysis of 
variance designs are used to determine the influence of the different factors on the pattern of decisions. Chi-squared 
goodness of fit values based on a normal approximation to the binomial are also computed; from this information, 
whether a procedure controls Type I error at the nominal level is assessed. Percent bias is also examined using the 
Bradley (1978) and Robey and Barcikowski (1992) guidelines for what constitutes acceptable departures of 
empirical rejection rates from the nominal rejection rates. 
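The two per-cell measures can be sketched as follows. The percent bias follows the definition in the text. The chi-square value is an assumption: the text says only that it is based on a normal approximation to the binomial, so the sketch uses the squared standardized difference between observed and expected rejection counts, which would be referred to a chi-square distribution with 1 degree of freedom. The function names are hypothetical.

```python
def percent_bias(alpha_empirical, alpha_nominal=0.05):
    """Percent bias of the empirical rejection rate relative to the nominal rate."""
    return 100.0 * (alpha_empirical - alpha_nominal) / alpha_nominal


def chi_square_gof(rejections, replications, alpha_nominal=0.05):
    """Squared normal deviate for a rejection count (assumed form, 1 df per cell)."""
    expected = replications * alpha_nominal
    variance = replications * alpha_nominal * (1.0 - alpha_nominal)
    return (rejections - expected) ** 2 / variance
```

For example, 550 rejections in 10,000 replications correspond to an empirical rate of .055, a percent bias of about 10, and a chi-square value of 2500/475, roughly 5.26.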

Results 



Empirical Type I error rate performance of each test statistic was assessed with 10,000 replications under 
every cell in the design. Bradley (1978) asserted that many researchers are unreasonably generous when defining 
acceptable departures of empirical alpha from the nominal level. He held that the departure of empirical alpha from 
the nominal level was “negligible” if empirical alpha was within $\alpha \pm \tfrac{1}{10}\alpha$ according to a ‘fairly stringent criterion’, 
and within $\alpha \pm \tfrac{1}{2}\alpha$ according to the “most liberal criterion that [he] was able to take seriously”, which in the remainder of 
his article he referred to as the ‘liberal criterion’. Robey and Barcikowski (1992) supplement the guidelines provided 
by Bradley for defining acceptable departures from the nominal level, providing an ‘intermediate criterion’ of 
$\alpha \pm \tfrac{1}{4}\alpha$ and a ‘very liberal criterion’ of $\alpha \pm \tfrac{3}{4}\alpha$. 

Appendix A details the percent bias ($B_\%$) results of each of the procedures. Inspection of the results showed 
that no procedure consistently controls empirical Type I error rates within any of the Bradley, Robey, and 
Barcikowski (BRB) criteria, and that patterns vary across levels of $P_2$, $p$, $n_1$, and $n_1:n_2$. 

Assessing whether there is a significant difference between the Type I error control of the procedures under study 
and the pattern of influence of $P_2$, $p$, $n_1$, and $n_1:n_2$ 

A factorial multivariate analysis of variance was conducted on the departures of empirical rejection rates 
from the nominal level; procedure type was parameterized as a repeated measures factor, and $P_2$, $p$, $n_1$, and $n_1:n_2$ were 
between-subjects factors. The multivariate test for procedure type yielded p < .001. All multivariate tests of interaction 
effects involving procedure type included in the model yielded p < .001; the five-way interaction effect was not tested 
as there was only one summary empirical rejection rate per cell. Follow-up factorial analyses were conducted for 
each test procedure. All tests of main, two-way, and three-way interaction effects yielded p < .001, with the exception 
of $p \times n_1$ for Hotelling’s F (p = .031), $p \times P_2$ for Kim (p = .019), $p \times n_1 \times n_1:n_2$ for Hotelling’s F (p > .05), 
$p \times n_1 \times P_2$ for Kim (p > .05), and $n_1 \times n_1:n_2 \times P_2$ for James1, James2, Yao, Johansen, and Kim (p > .05). 
As such, there is a significant 
difference in the Type I error control of the different statistical procedures, and this control varies across levels of $P_2$, 
$p$, $n_1$, and $n_1:n_2$. 
Assessing the departure of the empirical Type I error rate from the nominal level 

The Type I error control of the procedures was analyzed overall and across levels of $P_2$, $p$, $n_1$, and $n_1:n_2$. 
Chi-square results and summary statistics on the percent bias of the different procedures are shown in Tables 1-5. 

Chi-square goodness of fit 

Chi-square goodness of fit values to assess the departure of the empirical Type I error rate from the nominal 
level were computed for every factorial cell for each test statistic. Chi-squares were summed to yield composite chi- 
square goodness of fit tests. 

Chi-square results show that overall none of the procedures controlled empirical Type I error rates at the 
nominal level across all the conditions examined. Analyses were also conducted to determine whether the statistical 
significance of the departures of the empirical Type I error rate from the nominal level varied at different levels of 
$P_2$, $p$, $n_1$, and $n_1:n_2$. 

For the types of correlation pattern $P_2$ examined, the chi-square results show that the empirical rejection 
rates of the standard Hotelling T-squared F statistic were not significantly different from the nominal level when $P_1$ 
equals $P_2$, whereas the empirical rejection rates of the alternative procedures were significantly different from the 
nominal level under this condition. For $P_1$ equal to $P_2$, the magnitude of the chi-square results shows that Hotelling’s 
F evidenced the best overall control of the empirical rejection rate within the nominal level, followed by Nel, 
James2, Johansen, Kim, Yao, and James1. For all other conditions, where $P_1$ did not equal $P_2$, the empirical 
rejection rates of the standard and alternative procedures were significantly different from the nominal level. For $P_1$ 
unequal to $P_2$, when the correlations in group 2 were all .1, Hotelling’s F evidenced the best overall 
control of the empirical rejection rate within the nominal level, followed by Nel, James2, Johansen, Kim, Yao, and 
James1. However, when the correlations in group 2 were all .3, the order from least to greatest 
overall departure from the nominal level was Nel, James2, Johansen, Hotelling’s F, Kim, Yao, and James1. For 
correlations of .5 in group 2, Nel showed the least overall departure from the nominal level, followed by James2, 
Kim, Johansen, Yao, James1, and Hotelling’s F; for correlations of .7, Nel was followed by Kim, James2, Johansen, 
Yao, James1, and Hotelling’s F; and for correlations of .9, Kim was followed by Nel, James2, Yao, Johansen, 
James1, and Hotelling’s F. 

Under the levels of $p$ examined, the chi-square results indicate the empirical rejection rates for all the 
procedures were significantly different from the nominal level under every level of $p$. Though significantly different 
from the nominal level, at $p$ equal to 4, the magnitude of the chi-square results suggests Nel showed the least overall 
departure from the nominal level, followed by James2, Johansen, Kim, Yao, James1, and then the standard Hotelling 
F. At $p$ equal to 8, Nel still showed the least overall departure from the nominal level, followed by James2, Kim, 
Johansen, Yao, James1, and then the standard Hotelling F. At $p$ equal to 12, Nel continued to show the least overall 
departure from the nominal level, followed by Kim, James2, Johansen, Yao, James1, and then the standard Hotelling 
F. 

The chi-square results indicate the empirical rejection rates of all the procedures were significantly different 
from the nominal level for $n_1$ equal to $(p+1)$, $2p$, and $4p$; however, for $n_1$ equal to $10p$, $20p$, and $40p$, some 
procedures had empirical rejection rates that were not significantly different from the nominal level. Though 
significantly different from the nominal level, at $n_1$ equal to $p+1$, the chi-square results indicate Nel showed the least 
overall departure from the nominal level, followed by Kim, James2, Johansen, Yao, Hotelling’s F, and James1; at $n_1$ 
equal to $2p$, the order from least to greatest overall departure from the nominal level was Kim, James2, Nel, 
Johansen, Yao, James1, followed by Hotelling’s F; and at $n_1$ equal to $4p$, James2 showed the least overall departure 
from the nominal level, followed by Johansen, Yao, James1, Kim, Nel, and Hotelling’s F. At $n_1$ equal to $10p$, 
James2, Johansen, and Yao had empirical rejection rates that were not significantly different from the nominal level. 
At $n_1$ equal to $20p$, James1, James2, Yao, and Johansen had empirical rejection rates that were not significantly 
different from the nominal level. At $n_1$ equal to $40p$, James1, James2, Yao, Johansen, and Nel had empirical 
rejection rates that were not significantly different from the nominal level. 

Under the levels of $n_1:n_2$ examined, the chi-square results indicate empirical rejection rates were 
significantly different from the nominal level under every level of $n_1:n_2$ for every statistic. Though with an empirical 
rejection rate significantly different from the nominal level, at $n_1:n_2$ equal to 1:1, Nel showed the least overall 
departure from the nominal level, followed by Kim, Yao, Hotelling’s F, James2, Johansen, and James1. At $n_1:n_2$ 
equal to 1:2, Kim showed the least overall departure from the nominal level, followed by Nel, James2, Yao, 
Johansen, James1, and Hotelling’s F. At $n_1:n_2$ equal to 1:4, Nel showed the least overall departure from the nominal 
level, followed by James2, Kim, Johansen, Yao, James1, and Hotelling’s F. At $n_1:n_2$ equal to 2:1, James2 showed 
the least overall departure from the nominal level, followed by Yao, Kim, Johansen, Nel, Hotelling’s F, and James1. 
At $n_1:n_2$ equal to 4:1, James2 showed the least overall departure from the nominal level, followed by Kim, 
Johansen, James1, Yao, Hotelling’s F, and Nel. 

Bradley, Robey, and Barcikowski guidelines and percent bias 

According to the Bradley, Robey, and Barcikowski (BRB) guidelines for what constitutes acceptable levels 
of departure of empirical Type I error rates from the nominal level, procedures which control empirical rejection 
rates within $\alpha \pm \tfrac{1}{10}\alpha$ are described as providing “stringent” Type I error control, within 
$\alpha \pm \tfrac{1}{4}\alpha$ as providing “intermediate” control, within $\alpha \pm \tfrac{1}{2}\alpha$ as providing 
“liberal” control, and within $\alpha \pm \tfrac{3}{4}\alpha$ as providing “very 
liberal” control. Judgments are based on whether procedures consistently provided control of empirical rejection 
rates across the conditions, i.e., whether they provided control within the level specified across every cell under 
consideration; as such, judgments are based on whether the minimum and maximum percent bias of a given 
procedure is within the BRB guidelines across the conditions under consideration. 
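The judgment rule can be sketched as a small classifier over the percent bias values of one procedure across a set of cells. It takes the four criteria as departures of at most 10, 25, 50, and 75 percent of the nominal level (the Bradley and the Robey and Barcikowski limits) and adds the wider band of 100 percent used later in this section. The names are hypothetical, and treating the limits as inclusive is an assumption.

```python
BRB_LIMITS = (
    ("stringent", 10.0),
    ("intermediate", 25.0),
    ("liberal", 50.0),
    ("very liberal", 75.0),
)


def brb_label(percent_biases):
    """Strictest criterion satisfied in every cell, judged from min and max bias."""
    worst = max(abs(min(percent_biases)), abs(max(percent_biases)))
    for label, limit in BRB_LIMITS:
        if worst <= limit:  # inclusive limit (assumption)
            return label
    # Outside every BRB criterion; check the wider alpha +/- alpha band.
    return "within alpha +/- alpha" if worst <= 100.0 else "none"
```

For instance, a procedure whose percent bias ranges from -30 to 12 across the cells considered would be labeled "liberal", since its worst departure exceeds 25 percent but not 50 percent.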

As indicated earlier, no procedure consistently satisfies the BRB criteria for acceptable Type I error control 
across all conditions, nor does any procedure consistently control empirical rejection rates within $\alpha \pm \alpha$. 

For the different types of correlation pattern $P_2$ examined, the F statistic based on the standard Hotelling 
T-squared consistently provided stringent Type I error control for the conditions where $P_1$ equals $P_2$; Nel 
controlled empirical rejection rates within $\alpha \pm \alpha$ under this condition. For conditions where $P_1$ did not equal $P_2$, 
with the magnitude of the population correlation coefficients in group 2 equal to .1, the standard Hotelling F statistic 
provided consistent control of the empirical rejection rate within the liberal criterion; Nel controlled empirical 
rejection rates within $\alpha \pm \alpha$ under this condition. For the conditions where the population 
correlation coefficients in group 2 are all .3, .5, or .7, Nel consistently controlled empirical rejection rates within 
$\alpha \pm \alpha$. For the conditions where the population correlation coefficients in group 2 are .9, no 
procedure consistently controlled the empirical rejection rate within the BRB criteria for acceptable Type I error 
control or within $\alpha \pm \alpha$. 

At $p$ equal to 4, no procedure consistently controlled empirical rejection rates within the BRB criteria; 
however, Nel did consistently control empirical rejection rates within $\alpha \pm \alpha$. At $p$ equal to 8 or 12, no procedure 
controlled empirical rejection rates within the BRB criteria or $\alpha \pm \alpha$. 

Under the levels of $n_1$ examined, no procedure consistently controls the empirical rejection rates within the 
BRB criteria for acceptable departures of empirical rejection rates from the nominal level or within $\alpha \pm \alpha$ for $n_1$ 
equal to $(p+1)$. For $n_1$ equal to $2p$, James1 and Kim consistently control the empirical Type I error rate within 
$\alpha \pm \alpha$, which none of the other procedures do. For $n_1$ equal to $4p$, James2 and Johansen consistently control the 
empirical rejection rate within the intermediate criterion, James1 and Yao control the empirical rejection rate within 
the liberal criterion, Kim provides control within the very liberal criterion, and Nel within $\alpha \pm \alpha$. For $n_1$ equal to 
$10p$, James1, James2, Yao, and Johansen control the empirical rejection rate within the intermediate criterion, and 
Nel and Kim provide control within the liberal criterion. For $n_1$ equal to $20p$, James1, James2, Yao, Johansen, Nel, 
and Kim control the empirical rejection rate within the intermediate criterion. For $n_1$ equal to $40p$, James1, James2, 
Yao, and Johansen control the empirical rejection rate within the stringent criterion, and Nel and Kim provide 
control within the intermediate criterion. 

Under the levels of n1:n2 examined, the only procedure which consistently controls the empirical rejection
rates within any of the BRB criteria for acceptable departures of empirical rejection rates from the nominal level for
n1:n2 equal to 1:1 is Kim, though the Nel procedure does provide control within α ± α. No procedure controls
empirical Type I error rates within any of the BRB criteria for n1:n2 equal to 1:2 or 1:4. However, for n1:n2 equal to
2:1, James2 controls empirical rejection rates within the very liberal criterion, and the standard Hotelling F, Yao, and
Nel control the empirical rejection rate within α ± α.
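
The criteria referred to above can be read as bands on percent bias. The following is a minimal sketch, not the procedure used in the study: it assumes percent bias is computed as B% = 100(empirical rate - α)/α, and it assumes the BRB bands are α ± 0.10α (stringent), α ± 0.25α (intermediate), α ± 0.50α (liberal), and α ± 0.75α (very liberal), with α ± α as the widest band. Those cutoffs and the function names are assumptions introduced for illustration.

```python
# Illustrative sketch only. The band half-widths (in percent of alpha)
# are assumed values, not taken from this section of the paper.
BANDS = [
    ("stringent", 10.0),
    ("intermediate", 25.0),
    ("liberal", 50.0),
    ("very liberal", 75.0),
    ("alpha +/- alpha", 100.0),
]


def percent_bias(empirical_rate, alpha=0.05):
    """Percent bias of an empirical rejection rate relative to the nominal level."""
    return 100.0 * (empirical_rate - alpha) / alpha


def tightest_band(min_bias, max_bias):
    """Name of the narrowest band that contains both the minimum and maximum
    percent bias observed across conditions, or None if no band contains them."""
    worst = max(abs(min_bias), abs(max_bias))
    for name, half_width in BANDS:
        if worst <= half_width:
            return name
    return None


# Min B% / Max B% pairs of the kind reported in Tables 1 to 5.
for label, lo, hi in [("A", -7, 15), ("B", -67, 6), ("C", -3, 160)]:
    print(label, tightest_band(lo, hi))
```

A procedure is described as controlling the rejection rate "consistently" within a band only when both its minimum and maximum percent bias fall inside it, which is why the sketch classifies on the worse of the two extremes.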

Conclusions 



The findings in this paper on the two-group multivariate tests on means show that the F based on
Hotelling’s T-squared is robust to between-group differences in correlation matrices under equal and unequal
sample size conditions as long as the difference in the magnitude of the correlations is not extremely large, no matter
what the sample size conditions or the number of variables under study. However, under moderate to large
between-group differences in the correlation structure of the variables under study, the standard Hotelling T-squared F
procedure does not yield empirical Type I error rates that are consistently close to the nominal level. Under moderate
to large differences in the population correlation matrices, the standard F procedure performs quite well if sample
sizes are equal, the magnitude of the differences is not extreme, and the number of variables under study is not
particularly large. If sample sizes are unequal, however, then no matter what the sample size or the sample size to
number of variables ratio, the standard F procedure is conservative when the group with the smaller sample size has
the more strongly intercorrelated variables, and liberal when the group with the smaller sample size has the more
weakly intercorrelated variables.
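
For readers who want the statistic in concrete form, the sketch below computes the textbook two-group Hotelling T-squared with a pooled covariance matrix and converts it to F with p and n1 + n2 - p - 1 degrees of freedom. This section does not restate the formula, so the code follows the standard definition rather than the authors' own program; the function names and the small data set are hypothetical, and comparison against a critical value is left out.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, mean):
    """Sums of squares and cross-products about the group mean."""
    p = len(mean)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                out[i][j] += dev[i] * dev[j]
    return out


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is nonsingular (a singular matrix raises ZeroDivisionError)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def hotelling_f(group1, group2):
    """Return (T2, F, df1, df2) for the pooled-covariance two-group test."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    p = len(m1)
    w1, w2 = sscp(group1, m1), sscp(group2, m2)
    pooled = [[(w1[i][j] + w2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    quad = sum(d * s for d, s in zip(diff, solve(pooled, diff)))
    t2 = (n1 * n2 / (n1 + n2)) * quad
    df2 = n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f, p, df2


# Hypothetical bivariate data, for illustration only.
g1 = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
g2 = [[2.0, 4.0], [3.0, 3.0], [5.0, 6.0], [6.0, 5.0], [4.0, 7.0]]
print(hotelling_f(g1, g2))
```

Pooling the two sample covariance matrices is the step that presumes a common population covariance matrix, which is the assumption violated when the groups differ in correlation structure.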

The results of the current study on the performance of the Hotelling T-squared F statistic are consistent with
the literature on the heterogeneity of covariance matrices, which simulated only heterogeneity of variances and not
heterogeneity of the correlations. The literature on the impact of heterogeneity of variances indicates that when the
larger groups have the variables with the larger variances and the smaller groups have the variables with the smaller
variances, then the standard parametric procedures for multivariate tests on means are conservative; the literature
also indicates that when the larger groups have the variables with the smaller variances and the smaller groups
have the variables with the larger variances, then the standard procedures for multivariate tests on means are liberal.
Thus, when there is a positive relationship between sample size and the generalized variance of the groups, the 
standard procedures are conservative; and when there is a negative relationship, the procedures are liberal. For the 
conditions simulated in the present study, the generalized variance of a group with variables that are strongly 
intercorrelated is smaller than the generalized variance of a group with variables that are weakly intercorrelated. 
Thus, just as when only variance heterogeneity is simulated, when correlation heterogeneity is simulated, if there is a 
positive relationship between sample size and the generalized variance of the groups then the standard parametric 
multivariate means test procedure is conservative; and if there is a negative relationship then the procedure is liberal. 
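
The claim about generalized variance can be checked numerically. The sketch below assumes, purely for illustration, a correlation matrix in which every off-diagonal entry equals the same value rho (this section does not state the exact matrices that were simulated). For that structure the generalized variance, taken as the determinant, has the closed form (1 - rho)^(p-1) (1 + (p - 1) rho), which decreases as rho increases. The function names are hypothetical.

```python
def equicorrelation_matrix(p, rho):
    """p x p matrix with 1 on the diagonal and rho elsewhere (assumed structure)."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # treated as singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def generalized_variance_closed_form(p, rho):
    """Closed-form determinant of the equicorrelation matrix."""
    return (1.0 - rho) ** (p - 1) * (1.0 + (p - 1) * rho)


# Correlation magnitudes matching the levels listed in Table 2.
for rho in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9):
    numeric = determinant(equicorrelation_matrix(4, rho))
    print(rho, round(numeric, 4), round(generalized_variance_closed_form(4, rho), 4))
```

With unit variances in both groups, the group with the stronger intercorrelations therefore has the smaller generalized variance, which is the link the paragraph above draws to the variance-heterogeneity literature.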

Results showed clear differences between the performance profiles of the standard multivariate means test 
procedure and available alternative procedures. Unlike the Hotelling’s T-squared F procedure, the alternative 
procedures showed extremely good Type I error control at moderate to large sample sizes no matter what the number 
of variables under study, the magnitude of the between group differences in the correlation matrices, or the 
relationship between sample size and the generalized variance. Differences between the alternative procedures were 
mainly in terms of sample size requirements to yield acceptable Type I error control, with James2 and Johanson
showing the fastest convergence to acceptable Type I error rates, followed by James1, Yao, Nel, and Kim.
Importantly for researchers analyzing data from small sample research, none of the alternative procedures had
acceptable Type I error rates for the smallest level of sample size; under these conditions a researcher is well advised
to have equal sample sizes and use the standard parametric techniques. However for the analysis of data sets where 
sample sizes are unequal and the sample size to number of variables ratio is not extremely small, alternative 
techniques are preferred. 

Table 1. Overall chi-square (χ², df = 540) and percent bias (B%) results on Type I error control of multivariate tests
on means

Procedure          χ²    Min B%  Max B%  Mean B%   S B%
Hotelling  2291426.84 a     -89    1611       96   11.5
James1      509488.00 a     -10    1056       57    5.2
James2      120736.97 a     -12     662       22    2.6
Yao         292519.99 a     -46     875       31    4.2
Johanson    241381.95 a     -13     864       33    3.7
Nel          45222.75 a     -98     405       -9    1.7
Kim         116189.53 a     -83     408        8    2.7

Note: a = p < .001

Table 2. Summary chi-square (χ², df = 90) and percent bias (B%) results on Type I error control of multivariate tests
on means as a function of the magnitude of the correlations in P2

ρ      Procedure          χ²    Min B%  Max B%  Mean B%   S B%
.0     Hotelling       96.72       -10      10        0     .5
       James1       37567.84 a      -6     375       45    8.1
       James2        4794.09 a      -7     142       15    3.0
       Yao          24941.08 a     -46     440       25    7.2
       Johanson     12962.38 a      -8     257       23    5.0
       Nel           3663.97 a     -98       8      -14    2.6
       Kim          19546.60 a     -53     390       12    6.7
.1     Hotelling      363.66 a      -9      27        3     .9
       James1       39866.49 a      -8     392       56    8.4
       James2        5386.21 a      -8     153       15    3.2
       Yao          25482.77 a     -45     449       25    7.3
       Johanson     14312.32 a     -10     269       24    5.2
       Nel           3541.31 a     -97       8      -14    2.5
       Kim          19297.53 a     -52     381       12    6.6
.3     Hotelling    11510.92 a     -28     142       22    4.7
       James1       45978.06 a      -7     455       48    9.1
       James2        6925.99 a      -7     194       16    3.7
       Yao          31383.76 a     -39     513       27    8.2
       Johanson     17573.93 a      -8     324       25    5.9
       Nel           3648.57 a     -92      12      -14    2.5
       Kim          19733.42 a     -50     388      -11    6.7
.5     Hotelling    89728.38 a     -57     401       64   12.9
       James1       62635.72 a     -10     584       52   10.9
       James2       11026.87 a     -12     272       19    4.7
       Yao          43517.76 a     -35     633       29    9.7
       Johanson     26424.89 a     -13     435       29    7.3
       Nel           3771.16 a     -84      20      -14    2.6
       Kim          20639.06 a     -46     408       10    6.9
.7     Hotelling   448624.20 a     -75     872      155   28.2
       James1       98281.91 a      -4     739       62   13.8
       James2       21492.31 a      -8     383       25    6.6
       Yao          61932.73 a     -43     729       34   11.6
       Johanson     45791.20 a      -9     562       37    9.7
       Nel           4912.15 a     -90      67       -8    3.3
       Kim          19751.95 a     -57     378        6    6.8
.9     Hotelling  1741102.95 a     -89    1611      334   53.7
       James1      225157.95 a      -9    1056       87   21.2
       James2       71111.50 a      -9     662       42   12.2
       Yao         105261.87 a     -12     875       43   15.1
       Johanson    124317.22 a     -10     864       57   16.1
       Nel          25685.57 a     -97     405       13    7.7
       Kim          17221.97 a     -83     318       -5    6.4

Note: a = p < .001

Table 3. Summary chi-square (χ², df = 180) and percent bias (B%) results on Type I error control of multivariate
tests on means as a function of p.

p      Procedure          χ²    Min B%  Max B%  Mean B%   S B%
1      Hotelling   232890.75 a     -42     770       55   11.0
       James1       64049.50 a     -10     419       41    5.3
       James2       12419.85 a     -12     232       15    2.5
       Yao          32101.46 a     -46     321       19    4.1
       Johanson     18455.82 a     -13     272       19    3.0
       Nel           6465.34 a     -86      64      -11    1.8
       Kim           1909.13 a     -57     240        5    3.3
2      Hotelling   756793.92 a     -75    1326       99   19.8
       James1      163912.10 a      -9     784       57    8.8
       James2       37092.50 a      -9     457       22    4.4
       Yao          90050.82 a     -38     615       31    6.9
       Johanson     72262.82 a     -10     592       33    6.1
       Nel          13262.35 a     -95     220       -9    2.7
       Kim          37866.03 a     -69     329        7    4.7
3      Hotelling  1301742.16 a     -89    1611      134   25.8
       James1      281526.41 a      -8    1056       72   11.7
       James2       71224.62 a      -8     662       29    6.1
       Yao         170367.71 a     -34     875       42    9.5
       Johanson    150663.31 a     -10     864       46    8.8
       Nel          25495.05 a     -98     405       -5    3.9
       Kim          59226.37 a     -83     409       11    5.9

Note: a = p < .001

Table 4. Summary chi-square (χ², df = 90) and percent bias (B%) results on Type I error control of multivariate tests
on means as a function of n1

n1     Procedure          χ²    Min B%  Max B%  Mean B%   S B%
p + 1  Hotelling   445543.61 a     -75    1611      108   30.4
       James1      469698.13 a      12    1056      245   30.0
       James2      115987.92 a      -3     662      106   12.2
       Yao         277468.24 a     -46     875      138   21.1
       Johanson    228867.37 a       2     864      154   16.6
       Nel          31973.36 a     -98     405      -38    7.7
       Kim         109948.22 a     -57     408       85   13.4
2p     Hotelling   408523.95 a     -84    1566      102   29.2
       James1       37298.15 a       4     250       43    5.3
       James2        4319.55 a       5      95       23    2.1
       Yao          14151.20 a      22     134       37    4.3
       Johanson     11862.23 a      -3     160       37    3.5
       Nel          10268.78 a     -79     202      -10    4.8
       Kim           2826.83 a     -83      43       -7    2.5
4p     Hotelling   375447.34 a     -85    1534       96   28.1
       James1        2179.83 a      -3      45       18    1.3
       James2         174.33 a      -7      15        3     .6
       Yao            639.87 a     -10      28        7    1.0
       Johanson       397.85 a      -8      25        5     .8
       Nel           2310.25 a     -50      76       -2    2.3
       Kim           2227.86 a     -67       6      -16    1.6
10p    Hotelling   359819.93 a     -86    1513       92   27.6
       James1         154.94 a      -9      15        3     .5
       James2          98.11       -12       9        0     .5
       Yao            105.94       -11      11        1     .5
       Johanson        96.77       -13       9       -1     .5
       Nel            422.69 a     -26      27        0    1.0
       Kim            764.15 a     -35       7       -9    1.0
20p    Hotelling   352263.03 a     -89    1519       90   27.3
       James1          72.55        -8      15        1     .4
       James2          72.91        -8      13        0     .4
       Yao             72.47        -8      13        0     .4
       Johanson        76.94       -10      12       -1     .4
       Nel            140.00 a     -45      18        0     .6
       Kim            294.36 a     -19       6       -5     .6
40p    Hotelling   349828.98 a     -88    1499       90   27.2
       James1          84.41       -10       9        1     .4
       James2          84.14       -10       9        0     .4
       Yao             82.28       -10       9        0     .4
       Johanson        80.49       -10       9        0     .4
       Nel            107.67       -12      13        0     .5
       Kim            128.11 b     -14       9       -2     .5

Note: a = p < .001
      b = p < .01

Table 5. Summary chi-square (χ², df = 108) and percent bias (B%) results on Type I error control of multivariate
tests on means as a function of n1:n2.

n1:n2  Procedure          χ²    Min B%  Max B%  Mean B%   S B%
1:1    Hotelling     9828.24 a      -8     299       14   14.0
       James1      103486.62 a      -8     686       66   65.7
       James2       19690.75 a      -8     360       25   24.6
       Yao           3742.90 a     -46     191       -1   -1.5
       Johanson     39412.60 a     -10     510       34   34.1
       Nel           2349.02 a     -48      88       -3   -2.6
       Kim           2790.05 a     -57       8      -14  -14.3
1:2    Hotelling   416600.24 a      -7    1029      159   21.2
       James1      129294.32 a      -9     877       70   12.9
       James2       31005.39 a      -9     504       28    6.6
       Yao          56487.60 a     -10     677       39    8.9
       Johanson     61388.84 a      -9     682       41    9.2
       Nel          18432.67 a     -72     405        9    5.4
       Kim          16627.12 a     -66     218       13    5.1
1:4    Hotelling  1852183.37 a     -10    1611      353   43.4
       James1      251521.72 a      -9    1056      103   17.7
       James2       67460.96 a     -12     662       45    9.6
       Yao         226395.77 a     -11     875       95   17.0
       Johanson    133506.68 a     -13     864       67   13.3
       Nel          11354.13 a     -98     226       -5    4.3
       Kim          91670.44 a     -83     408       38   11.7
2:1    Hotelling     8459.33 a     -89      10      -27    2.7
       James1       21681.20 a     -10     218       33    5.0
       James2        2240.68 a     -10      71       10    1.7
       Yao           2287.83 a     -12      85        9    1.8
       Johanson      6096.05 a     -10     133       15    2.8
       Nel           7717.30 a     -93       8      -24    2.7
       Kim           4581.60 a     -45     119        7    2.7
4:1    Hotelling     4355.66 a     -68      10      -18    2.0
       James1        3504.13 a      -9      96       13    2.1
       James2         339.19 a      -9      29        3     .7
       Yao           3605.88 a      -8     101       12    2.2
       Johanson       977.79 a     -10      55        5    1.2
       Nel           5369.63 a     -97       8      -19    2.3
       Kim            520.35 a     -24      16       -4     .9

Note: a = p < .001